A Practical Part-of-Speech Tagger
Authors
Abstract
We present an implementation of a part-of-speech tagger based on a hidden Markov model. The methodology enables robust and accurate tagging with few resource requirements. Only a lexicon and some unlabeled training text are required. Accuracy exceeds 96%. We describe implementation strategies and optimizations which result in high-speed operation. Three applications for tagging are described: phrase recognition; word sense disambiguation; and grammatical function assignment.
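To make the hidden Markov model framing concrete, the following is a minimal sketch of Viterbi decoding, the standard way to recover the most likely tag sequence from an HMM. The abstract does not say which decoding procedure the tagger uses, so this is an assumption for illustration only. The function name `viterbi`, the tag set, and every probability in the toy model are hypothetical; in the described system the parameters would be estimated from a lexicon and unlabeled text, not written by hand.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a toy HMM.

    Probabilities missing from the tables are replaced by `floor`
    so that the logarithm is always defined (an illustrative choice).
    """
    if not words:
        return []

    def lp(p):
        return math.log(max(p, floor))

    # best[i][t]: log-probability of the best path that ends in tag t at position i
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(words[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            # Pick the predecessor tag that maximizes the path score into t.
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0))
            back[i][t] = prev

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical toy model: hand-picked numbers, not trained parameters.
tags = ["DET", "NOUN", "VERB"]
start_p = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
trans_p = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
emit_p = {
    "DET": {"the": 0.9},
    "NOUN": {"dog": 0.4, "barks": 0.1},
    "VERB": {"dog": 0.05, "barks": 0.5},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
```

Working in log space avoids numeric underflow on long sentences, and the running time is proportional to the sentence length times the square of the number of tags.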
Similar resources
Porting a Stochastic Part-of-Speech Tagger to Swedish
Abstract: The Xerox Part-of-Speech Tagger (XPOST) claims to be practical. One aspect of practicality as defined here is reusability. Thus it is meant to be easy to port XPOST to a new language. To test this, XPOST was ported to Swedish. This port is described and evaluated. In previous work on part-of-speech tagging, a practical part-of-speech tagger was defined as one with the following set o...
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
The Grammar of Sense: Using part-of-speech tags as a first step
This paper describes two experiments: one exploring the amount of information relevant to sense disambiguation contained in the part-of-speech field of entries in a Machine Readable Dictionary (MRD); the other, more practical, experiment attempts sense disambiguation of all content words in a text assigning MRD homographs as sense tags using only part-of-speech information. We have implemented a ...
Part of Speech Tagging with Mixed Approaches of Neural Networks and Transformation Rules
For the purpose of constructing a practical part of speech tagger that uses as few training data as possible, an approach using neural networks, which uses different lengths of contexts based on longest context priority and takes into account the maximization of information amount, has been proposed so far. To further improve the tagging performance, this paper proposes an integrated approach o...
The Grammar of Sense: Using Part-of-Speech Tags as a First Step
This paper describes two experiments: one exploring the amount of information relevant to sense disambiguation contained in the part-of-speech field of entries in a Machine Readable Dictionary (MRD). Another, more practical, experiment attempts sense disambiguation of all open class words in a text assigning MRD homographs as sense tags using only part-of-speech information. We have implemented ...